Two-stage mural image restoration using an edge-constrained attention mechanism

The current mainstream image restoration methods have difficulty fully learning the structure and color information of murals in mural image restoration tasks due to the limited size of the available datasets, resulting in problems such as structural loss and texture errors. This study proposes a two-stage mural restoration network based on an edge-constrained attention mechanism. This paper introduces additional sketches as inputs during the coarse restoration phase and incorporates a local edge loss function to enable the network to generate corresponding structural information based on the sketches. In the fine restoration phase, the calculation for the similarity between missing areas and known areas is optimized to enhance the consistency of the restoration results with the texture of the known areas. Furthermore, a structure-guided attention propagation block is introduced after adopting the attention mechanism. This block selectively integrates surrounding contextual information to update the attention score map, thereby enhancing the coherence and plausibility of the generated textures. The experimental results show that the proposed method outperforms the current mainstream restoration methods according to various assessment indices. The proposed method generates high-quality structural information according to user guidance information, and the repaired texture is highly visually consistent with that of the original mural, with few noticeable deviations. This study provides a new approach for mural restoration, which may positively impact cultural heritage protection and artistic restoration applications.


Introduction
Ancient murals contain a large amount of historical and cultural information that reflects the artistic style of each period and how this style evolved; moreover, such murals reflect the historical integration of Chinese and Western art [1][2][3]. However, due to weathering and human destruction, many murals have fractured, faded or been damaged in other ways. Traditional mural restoration methods usually involve painting directly on these cultural relics. However, the workload of these methods is high, the restoration time is long, and the restoration is irreversible, which may cause further mural damage. In contrast, digital restoration methods are typically low risk, highly efficient and enable permanent preservation. Therefore, the use of digital image restoration technology for virtual restoration is a topic of interest among both domestic and foreign scholars.
Traditional image restoration methods are primarily based on either texture block matching or sparse representations. These methods usually rely on diffusion or image block matching mechanisms [4][5][6][7]. With this type of method, the low-level features of an undamaged area are transferred to the damaged area, and favorable results have been obtained for images with small damaged areas, simple structures or repeated textures. However, because the damaged areas are repaired based only on the local features of a single image, the true content cannot be expressed, and the repair results are semantically unreasonable.
As deep learning has developed, its notable generalizability has provided a new approach to image repair, which can be adopted to repair the overall image content at the semantic level. The context encoder [8] is one of the earliest deep learning methods used for semantic image restoration. In this method, an autoencoder structure is used to repair the defect area by minimizing the pixel-level reconstruction loss and adversarial loss. However, the context encoder method has several limitations, including the information bottleneck introduced by the fully connected layer and the lack of constraints on the local image area, resulting in obvious visual artifacts in the inpainted area of the output image. To address the information bottleneck problem of the context encoder method, Iizuka et al. [9] reduced the number of downsampling layers and applied a series of dilated convolutional layers to replace the fully connected layer. In addition, they introduced a local discriminator to improve the quality of the restored image. Yang et al. [10] introduced coarse-to-fine convolutional network schemes for image inpainting. Yan et al. [11] added a shift connection layer based on U-Net [12] and then guided the encoder to repair the features of the unknown region via feature fusion. Yu et al. [13] introduced the attention mechanism in the fine repair stage. In their method, features that are similar to those of the region to be repaired are extracted from the known region by the contextual attention layer, and the damaged region is then constructed according to the similarity scores of the blocks to ensure that the network is guided by global contextual information. Liu et al. [14] proposed the coherent semantic attention (CSA) mechanism, which considers not only the association between the known and damaged regions but also that between the internal units within the missing region. Their method effectively addresses the issues of internal faults and distortions in the repair area, which further improves the repair quality. Although these methods can generate relatively fine textural details, structural distortions may occur.

Nazeri et al. [15] proposed a two-stage restoration method that first predicts and generates contour lines to obtain complete boundary information and then employs this information to guide image color filling. This method addresses the structural distortion problem to some extent; however, it has difficulty directly predicting the complete contours of the missing regions for murals with complex structures. Furthermore, incorrect structural information generated by the model cannot be modified by the user as needed. To introduce user prior knowledge to guide repair, Portenier et al. [16] proposed the Faceshop model, which combines image generation with image translation; in this model, external sketches are employed to achieve end-to-end training. Yu et al. [17] replaced the traditional convolution with the gated convolution to distinguish masks from user-provided sketch information. The network generates the corresponding image content according to the sketch information. Ren et al. [18] introduced gated convolutions into Thang-ga image restoration and integrated image edge information extracted by the Canny algorithm as an extra input during training. Their method improved the restoration of structural information in mural images. To further enhance restoration quality, Liu et al. [19] proposed the mural image restoration model MPR-Net, which uses multistage progressive inference from global to local receptive fields to address the issue of semantic singularity in recurrent networks, achieving detailed texture restoration.
In the past three years, attention-based image restoration models [20,21] have been further developed, which generate images with more refined textures. Li et al. [22] proposed a cyclic feature inference network, which embeds the attention mechanism into the recurrent feature reasoning (RFR) module; the resulting model can be generalized by repeatedly using the parameters of the RFR module. In addition, they introduced the knowledge-consistent attention module to fuse the attention score in an adaptive manner, which gradually improves the feature map, leading to better texture restoration effects in a wide range of missing areas. Cao et al. [23] developed a multilevel attention propagation-driven image restoration network. This network compresses high-level features extracted from full-resolution images into multiscale compact features and then uses these compact features for multilevel attention feature propagation based on the scale of the features, enabling the propagation of advanced features (including structural and detail features) in the network. Chen et al. [24] proposed an image restoration model based on the combination of structure-constrained attention (SCA) and a pixel-level CAM module, which introduces CSA in the coarse repair stage to enhance semantic-level restoration and pixel-level CA in the fine repair stage to enhance pixel-level restoration. All these methods leverage the coarse repair results of the damaged area and the true values of the known area; however, these approaches cannot precisely determine the pixel relationship between the damaged and known areas. Thus, when these approaches are applied for mural image restoration, noticeable texture restoration deviations can occur.
Therefore, a two-stage mural restoration network based on edge-constrained attention propagation is proposed in this study. The main contributions can be summarized as follows:
1. The existing methods for feature extraction may produce model outputs that do not satisfy the requirements for sketch information. In view of this issue, a local structure loss function, L_edge, is proposed. In addition, the cross-entropy loss is used to constrain the distance between the pixel distribution of the generated structural information and that of the prior structural information, thus guiding the generator to generate reasonable and clear structures based on the structural information provided by the user.
2. The existing methods employing the attention mechanism calculate the similarity between the generated results of the to-be-repaired region and the original information from the known region. This approach places stringent demands on the quality of the first-stage repair, often leading to texture errors in the final repair results. To mitigate this issue, an SCA module is designed, which enables the attention score to more accurately reflect the degree of correlation between the known and unknown regions, thereby improving the texture consistency between the generated and known regions.
3. For mural images with complex structures, first-stage repair may not accurately infer the correct repair result based on semantic information, potentially resulting in errors in the final repair result. To address this challenge, a structure-constrained attention propagation (SCAP) layer is designed, which enables the network to infer and update attention scores based on sketch structural information, thereby generating more reasonable semantic information.

Overall network design
The network repair process and overall structure of the two-stage mural image restoration method based on the proposed SCA mechanism are shown in Fig 1. The restoration process of the model consists of two main stages: coarse and fine repair. In the coarse repair stage, I_masked (the image to be repaired), I_s (the sketch containing the structural information of the missing areas) and I_m (the mask of the area to be repaired) are concatenated along the channel dimension; then, image I_fg, in which all areas are generated, is obtained through generator G1.
Next, I_fg is cropped and added to the damaged area of I_masked, yielding I_fp. In the fine repair stage, I_fp is processed by generator G2-1, with one part used as the input for G2-2 and the other part combined with I_s' (obtained through 4× downsampling of I_s by bilinear interpolation) and I_m' (obtained through 4× downsampling of I_m by bilinear interpolation), which is used as the input to the lower branch of the SCA module. The outputs of the two branches are concatenated, and the final restoration result I_c is obtained through generator G2-3. The main components of the network include generator G1, generator G2 and discriminator D. The structure of G1 is summarized in Table 1. The upper branch of G2 has the same structure as G1, the lower branch of G2 includes the edge-constrained SCA module (this module is detailed in a later section), and the discriminator D is composed of five 5×5 ordinary convolutions with a stride of 2, with output channel numbers of 48, 96, 192, 192 and 1.
Coarse repair network. The role of the coarse repair network is to fill the missing area with the appropriate structure and texture and determine the similarity between the missing and known areas, which is used by the fine repair network. The network includes multiple downsampling modules, and dilated convolutions are added to expand the receptive field of the network, thereby allowing the network to focus on the known global information. Ordinary and partial convolutions cannot treat the pixels in the missing area as invalid pixels while processing the sketch information as effective pixels; to address this issue, gated convolutions are introduced to replace the ordinary convolutions. By using gated convolutions, the influence of missing regions is reduced in the feature extraction process, and the guidance information of the sketches is highlighted. The gated convolution can be expressed as follows:

$$O = \phi(W_f * I) \odot \sigma(W_g * I)$$

where I denotes the input, which is the feature after downsampling in the network; O denotes the output; W_g and W_f are the convolution filters used to calculate the gating values and feature values, respectively; * denotes convolution and ⊙ denotes element-wise multiplication; ϕ is the rectified linear unit (ReLU) activation function; and σ is the sigmoid activation function, which outputs the gating value within the range of [0,1]. Through the gating value, the user guidance information and missing area information can be separated to highlight the guidance information of the sketch and ensure the accuracy of the structural information in the repaired region.
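The gating idea described above can be illustrated with a minimal one-dimensional sketch in plain Python. It is not the network's implementation: the function names, the 1-D "valid" convolution and the example filters are assumptions chosen only to show how a sigmoid gate scales a ReLU feature response.

```python
import math


def conv1d_valid(signal, kernel):
    """Plain 1-D 'valid' correlation (no padding), used here as a toy convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_conv1d(signal, w_f, w_g):
    """Toy gated convolution: ReLU(feature response) scaled by sigmoid(gating response).

    w_f and w_g play the roles of the feature and gating filters; both are
    hypothetical example filters, not learned weights.
    """
    feature = conv1d_valid(signal, w_f)
    gating = conv1d_valid(signal, w_g)
    return [relu(f) * sigmoid(g) for f, g in zip(feature, gating)]


# A zero gating filter gives a gate of 0.5 everywhere, halving the features.
half_open = gated_conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], [0.0, 0.0])
# A strongly negative gating response closes the gate almost completely.
closed = gated_conv1d([1.0, 2.0], [1.0, 1.0], [-50.0, -50.0])
```

In the real network the gate is learned per channel and per spatial position, so it can suppress responses in masked regions while passing sketch pixels through.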
Fine repair network. The fine repair network is used to further optimize the coarse repair results, eliminate the texture smoothing problem from the coarse repair stage, and ensure consistency between the repaired and known areas. The fine repair network uses two parallel repair branches that consider structural and textural information. Deep learning methods have difficulty achieving fine texture restoration due to issues such as small mural datasets, complex colors in murals, and differences between the distributions of the strong internal features of the images. In this paper, the SCA mechanism is proposed, which obtains the correlation between the missing and known regions by generating a coarse result. An attention propagation block based on structural constraints is added to update the correlation, and finally, fine texture details are reconstructed.
Discriminator. The spectral-normalized PatchGAN (SN-PatchGAN) algorithm is used in the discriminator network. The discriminator network comprises five standard convolutions with a kernel size of 5×5 and a stride of 2. Since the SN-PatchGAN discriminator network performs feature extraction and true-false discrimination for each image block, the local textural details of the mural can be learned and enhanced by this network.
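To make the patch-level nature of this discriminator concrete, the following sketch traces the feature map shape through five 5×5 convolutions with a stride of 2 and the output channel numbers given earlier (48, 96, 192, 192 and 1). The 256×256 input size and the padding of 2 are assumptions for illustration; the paper does not state them.

```python
# Output channels of the five discriminator convolutions, as listed in the text.
CHANNELS = [48, 96, 192, 192, 1]


def conv_out_size(n, kernel=5, stride=2, padding=2):
    """Standard convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def discriminator_shapes(height, width, padding=2):
    """Return (channels, height, width) after each of the five convolutions."""
    shapes = []
    for channels in CHANNELS:
        height = conv_out_size(height, padding=padding)
        width = conv_out_size(width, padding=padding)
        shapes.append((channels, height, width))
    return shapes


# Assumed 256x256 input: each layer halves the spatial resolution.
shapes = discriminator_shapes(256, 256)
```

Under these assumptions the final output is a 1-channel 8×8 map, so each output element judges one image patch rather than the whole image.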
Loss function. The loss functions used in this paper include the adversarial loss, pixel reconstruction loss, perceptual loss and local edge loss.

The adversarial loss can improve the visual quality of the generated image and is often used in image generation [25] and image style transfer [26] tasks. In addition, the adversarial loss aims to continuously optimize the generator and discriminator to improve the detailed information of the generated image. The adversarial loss is defined over P_data(I_gt), the distribution of the real image, and P_miss(I_gt), the distribution of the repaired image. The generator aims to minimize this loss as much as possible, while the discriminator aims to maximize it as much as possible. In this way, the model can be continuously optimized.

The pixel reconstruction loss is used to measure the pixel differences between the generated mural I_pred and the real mural I_gt.

In the perceptual loss function, the features obtained by the convolutions are compared with those of the real image. This loss function can be used to measure the high-level semantic similarity between images [27] and effectively improve the structure of the repaired image. In the perceptual loss, ϕ_l(I) is the l-th layer feature map of image I extracted from the pooling layer of the VGG-16 [28] network pretrained on the ImageNet [29] dataset, and h_l, w_l and c_l are the height, width and number of channels of ϕ_l(I), respectively.

The local edge loss (L_edge) is a new loss function proposed in this paper to measure the pixel difference between the structural information of the generated region and the sketch. This loss function is explained in detail in the Improvements section.
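For orientation, the three standard losses described above are commonly written in forms such as the following. This is a sketch under stated assumptions, not the paper's exact definitions: the minimax form of the adversarial loss, the use of the L1 norm, and the names L_adv, L_rec and L_perc are all assumptions.

```latex
% Adversarial loss (assumed standard minimax form; G minimizes, D maximizes)
\min_{G}\max_{D}\; L_{adv}
  = \mathbb{E}_{I_{gt}\sim P_{data}}\big[\log D(I_{gt})\big]
  + \mathbb{E}_{I_{pred}\sim P_{miss}}\big[\log\big(1 - D(I_{pred})\big)\big]

% Pixel reconstruction loss (L1 norm assumed)
L_{rec} = \lVert I_{pred} - I_{gt} \rVert_{1}

% Perceptual loss over VGG-16 feature maps \phi_l of size h_l \times w_l \times c_l
% (L1 norm and summation over layers l assumed)
L_{perc} = \sum_{l} \frac{1}{h_l\, w_l\, c_l}
  \big\lVert \phi_l(I_{pred}) - \phi_l(I_{gt}) \big\rVert_{1}
```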
To focus each generator on the tasks that need more attention in each stage, the two generators are separately trained. In the coarse repair stage, reasonable structural information is generated based on the sketch information, and the correlation between the defect and known areas is determined. The loss function for this stage includes a parameter α that is used to focus on certain areas in the image, which addresses issues such as incompleteness and errors in the restored images.
The fine repair phase aims to generate finer and more realistic texture information than that generated in the coarse repair stage, and a separate loss function is used for this stage.

Improvements
The local edge loss function L_edge. As the network depth increases, the network focuses more on the details of the image and less on the structural information contained in the sketch, which reduces the guiding role of the sketch. Moreover, the generated structural information is blurred when only gated convolutions are used. Therefore, a local edge loss L_edge is proposed in this paper to further constrain the network repair process. In the proposed loss function, I_gt denotes the intact mural image, and I_s denotes the edge detection map of I_gt, where 1 represents an edge position and 0 represents a nonedge position. Moreover, I_m is the normalized mask map, with 0 denoting missing areas and 1 denoting known areas. I_s and I_m are multiplied element-wise to obtain sketch E, which denotes the user-provided guidance structural information for the missing area. Then, the Canny edge detection algorithm is used to extract structural information I_f from the coarse-stage generation result to obtain the structural information graph Y. The proportion of correct pixel points in Y with respect to the user-guided information is calculated as the edge loss, in which Y_{i,j} is the predicted structural information for pixel (i, j), and E_{i,j} is the true value of the structural information for pixel (i, j) in the missing area. The numerator represents the number of edge pixels with matched structural information between the generated result and the user-guided structure, and the denominator represents the number of edge pixels used to guide the structural information. The loss function ranges from 0 to 1. For each pixel in the user guidance information area, the predicted value should be as close as possible to the true value of the edge image. If a pixel lies outside the user guidance information area, the corresponding loss is not considered, which prevents unnecessary restrictions from being imposed on that area in the repair process.
The edge loss function and the original repair loss function are weighted and summed to obtain the total loss function. Then, gradient descent and other optimization methods are used to minimize the total loss function, and finally, the optimal repair result is obtained.
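The ratio described above (matched guided edge pixels divided by all guided edge pixels) can be sketched in plain Python as follows. Two points are assumptions: the guidance map E is built here as I_s · (1 − I_m) so that it covers the missing area when the mask uses 0 for missing pixels, and the function returns the match ratio itself; how that ratio enters the optimized loss (for example as 1 − ratio) is not specified in the text.

```python
def edge_match_ratio(pred_edges, sketch_edges, mask):
    """Fraction of user-guided edge pixels that are reproduced in the prediction.

    pred_edges   -- binary edge map Y extracted from the coarse result (1 = edge)
    sketch_edges -- binary edge map I_s of the intact image (1 = edge)
    mask         -- binary mask I_m (1 = known, 0 = missing)

    The guidance map is formed as I_s * (1 - I_m), an assumption made so that
    only edges inside the missing area are counted.
    """
    matched = 0
    guided = 0
    for y_row, s_row, m_row in zip(pred_edges, sketch_edges, mask):
        for y, s, m in zip(y_row, s_row, m_row):
            e = s * (1 - m)      # guidance edge inside the missing area
            guided += e          # denominator: guided edge pixels
            matched += y * e     # numerator: guided edges that were predicted
    # Pixels outside the guidance area contribute nothing to either sum.
    return matched / guided if guided else 0.0


pred = [[1, 0], [1, 1]]
sketch = [[1, 1], [0, 1]]
mask = [[0, 0], [0, 1]]   # only the bottom-right pixel is known
ratio = edge_match_ratio(pred, sketch, mask)
```

In this toy case two guided edge pixels lie in the missing area and one of them is reproduced, so the ratio is 0.5.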
Proposed SCA mechanism. The SCA mechanism proposed in this paper includes three parts: attention calculation, attention propagation and feature reconstruction.

Regarding the attention calculation, the existing CA and CSA methods obtain the attention score by calculating the cosine similarity between the coarse repair result of the missing area and the true texture of the known area, which places high demands on the first-stage repair results. When the coarse inpainting result differs from the real texture, the calculated similarity does not accurately represent the correlation between the missing and known regions, which may lead to texture deviations in the final repaired result. To address these problems, the SCA module is proposed, as shown in Fig 3. F_g, with a size of 32×32×192, is the feature map of I_fg after processing by two convolutions and nearest neighbor interpolation. F_p, which also has a size of 32×32×192, is the feature map of I_fp generated after processing by two convolutions and nearest neighbor interpolation. The torch.nn.Unfold function is used to extract patches from F_g to obtain information block P, with a size of 3×3×192×1024, and the cosine similarity between the missing region and all known regions is calculated. In this calculation, p^l_{i,j} is a feature block with a size of 3×3×192 extracted from feature map F_g, with (i, j) as the center point; C_{i,j} is the feature block extracted from F_g through a sliding window during the convolution, with (i, j) as the center point; and S is the attention map with a size of 32×32×1024, which contains the similarity information of all the feature blocks. The superscript l denotes the lth layer. After the attention score is calculated, it is propagated and updated by the SCAP module, and a channel-wise softmax normalization is performed to obtain the final attention score S'.

In the feature reconstruction part, F_p is divided into blocks, and the area to be repaired is set to 0 using the mask I_m' to obtain a feature block that contains only the known areas, which has a size of 3×3×1024. This feature block is used as a convolutional kernel to deconvolve S' to obtain the reconstructed feature map F_r, with a size of 32×32×192.

The SCAP module. Unreasonable repair results during the coarse repair phase could affect the block similarity score, which could impact the final repair results. Fig 4C shows the coarse repair result. The coarse repair error in the red box region leads to increased focus on the information in the left and right regions, rather than that in the lower region, in the fine repair stage, resulting in an unreasonable final output. To address this problem, the proposed method includes an edge-constrained attention propagation module after the attention score is calculated. It is assumed that objects in the same closed area exhibit similar textures, and the attention score is updated according to edge information constraints to improve the rationality and continuity of the generated texture and improve the final repair effect. Using right propagation as an example, the similarity score is updated from S_{x,y,x',y'}, which denotes the similarity score between P_{x,y} and P_{x',y'}, and m, the edge information graph, which contains only two values, 0 and 1. If the value is 0, the position contains edge information. When an edge is encountered, the value on the right side is no longer considered by the model.

A framework diagram of the SCAP module is shown in Fig 5. The input to the module is the original attention score, and the output is the updated attention score. In the loop update module, the attention score is updated by using the surrounding information within the same closed area. I_s denotes a matrix with the same size as the attention score, obtained by resampling the structural information graph with bilinear interpolation, where edge structure positions are denoted by 0 and nonedge positions are denoted by 1. The attention score is stratified through element-wise multiplication with I_s, and the new attention score is convolved with the convolution kernel shown in the figure. This convolution combines the attention score contributions of the information around the defect area to update the attention score. I_s reduces the influence of edge information and surrounding nonsimilar objects during the convolution process, and the parameter i controls the scope of the surrounding areas in the attention update process. The selection control module combines the initial attention score of the edge position with the updated attention score of the nonedge position so that the score is updated for only the nonedge position. Finally, the softmax function is used to obtain the final attention score. In this paper, multiple ablation experiments with different numbers of cycles are performed based on a mural dataset. The experimental results are shown in Table 2.
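The attention steps described above (cosine similarity between feature blocks, edge-constrained propagation, and softmax normalization) can be sketched on a toy one-dimensional row of patches. This is an illustration under assumptions rather than the paper's formulation: the exact propagation equation is not reproduced here, so `propagate_right` simply accumulates the scores of right-hand neighbours and stops at the first edge position (mask value 0), and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two flattened feature blocks."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores):
    """Numerically stable softmax over one position's scores (the 'channel' axis)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_right(scores, edge_mask, reach):
    """Assumed right propagation on a 1-D row of patches.

    scores[x][x2] -- similarity between patch x and patch x2
    edge_mask[x]  -- 0 where patch x lies on an edge, 1 elsewhere
    reach         -- how many right-hand neighbours may contribute

    The score of pair (x, x2) accumulates the scores of the shifted pairs
    (x + i, x2 + i); accumulation stops once an edge position is reached,
    so information does not cross a structural boundary.
    """
    n, n2 = len(scores), len(scores[0])
    updated = [[0.0] * n2 for _ in range(n)]
    for x in range(n):
        for x2 in range(n2):
            total = scores[x][x2]
            for i in range(1, reach + 1):
                if x + i >= n or x2 + i >= n2 or edge_mask[x + i] == 0:
                    break
                total += scores[x + i][x2 + i]
            updated[x][x2] = total
    return updated


toy_scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
no_edges = propagate_right(toy_scores, [1, 1, 1], reach=1)
with_edge = propagate_right(toy_scores, [1, 0, 1], reach=1)
final = [softmax(row) for row in with_edge]
```

With an edge at the middle position, the first row is left unchanged because its right-hand neighbour lies on the edge, which mirrors the statement that values beyond an edge are no longer considered.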

Algorithm steps and processes
The specific training process of the two-stage mural restoration algorithm based on the edge-constrained attention mechanism is shown in Algorithm 1.

Experimental environment and design
The experimental environment was based on the Windows 10 operating system, and PyTorch version 3.9.0 was used for network training. The computer was configured with an Intel(R) Core(TM) i7-11800H CPU, with 16 GB of memory. The graphics card was an NVIDIA GeForce GTX 3060 card with 8 GB of memory. The CUDA version was 11.0, and the Adam optimizer was used for training.

The training process was divided into three stages. First, the one-stage generator and discriminator were trained. The batch size was set to 4, the number of epochs was set to 100, each round included 700 iterations, and the learning rate was set to 0.001. Then, the one-stage generator was updated, and the two-stage generator and discriminator were trained. In this process, the batch size was set to 4, the number of epochs was set to 100, each round included 700 iterations, and the learning rate was set to 0.0005. Finally, the two generators and the second-stage discriminator were combined for training. The batch size was set to 4, the number of epochs was set to 20, each round included 700 iterations, and the learning rate was set to 0.0002.

There are few complete or well-preserved murals that can be used to train deep neural models because of the large damaged areas in real murals. In our study, in addition to real murals, we collected mural replicas provided by artists. The images of the real murals were obtained with a digital camera, and the mural replicas were obtained through the Complete Collection of Dunhuang Murals in China and Tomb Murals of the Silk Road in China. Our dataset included 1519 physical murals and 1295 replicas, for a total of 2814 images. To ensure that the model fully learned the relationship between the structural information and the images, we randomly cropped the images in the dataset and generated masks for data augmentation during training. Most of the structural information graphs for training and testing were extracted with the Canny edge algorithm, where the low threshold was set to 80, the high threshold was set to 240, a 3×3 Gaussian filter was used as the Sobel operator, and the binary threshold was set to 0.3. In addition, some of the sketches used in this study were manually drawn.
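The three-stage schedule and the Canny settings reported above can be collected into a small configuration sketch. The dictionary keys and helper names are hypothetical; the numeric values are taken from the text, and the derived totals are simple arithmetic on them.

```python
# Training schedule as reported in the text (key names are illustrative).
STAGES = [
    {"name": "stage-1 generator + discriminator",
     "batch_size": 4, "epochs": 100, "iters_per_epoch": 700, "lr": 1e-3},
    {"name": "stage-2 generator + discriminator",
     "batch_size": 4, "epochs": 100, "iters_per_epoch": 700, "lr": 5e-4},
    {"name": "joint training of both generators",
     "batch_size": 4, "epochs": 20, "iters_per_epoch": 700, "lr": 2e-4},
]

# Canny edge extraction settings as reported in the text.
CANNY = {"low_threshold": 80, "high_threshold": 240, "binary_threshold": 0.3}


def total_iterations(stages):
    """Total number of optimizer steps over all stages."""
    return sum(s["epochs"] * s["iters_per_epoch"] for s in stages)


def samples_processed(stages):
    """Total number of training samples drawn (iterations times batch size)."""
    return sum(s["epochs"] * s["iters_per_epoch"] * s["batch_size"] for s in stages)


iterations = total_iterations(STAGES)   # 70000 + 70000 + 14000
samples = samples_processed(STAGES)
```

The learning rate decreases from stage to stage, which is consistent with the later stages refining, rather than relearning, what the earlier stages produced.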

Experimental analysis and comparison
To better verify the advanced nature and application value of this model, five analyses are presented in this paper: a quantitative comparative analysis, a qualitative comparative analysis, a comparative analysis of the structure repair results for different sketches, a module effectiveness analysis, and a comparative analysis of real repairs for damaged murals. To better display the structural information in the sketches, the sketches shown in the following experimental results were obtained by inverting the pixel values of the edge detection image.

Quantitative comparative analysis
To verify the validity and versatility of the proposed model, comparison experiments are performed with datasets with four mask ratios of 10-20%, 20-30%, 30-40% and 40-50% [30]. In this paper, the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and mean square error (MSE) are used to evaluate the pixel-level difference, overall similarity, and error, respectively, between the repaired and original images. The higher the PSNR and SSIM are, the better the repair effect, and the lower the MSE is, the better the repair effect [31,32]. Table 3 shows the quantitative comparison results. The proposed method ranked among the top two approaches in terms of the PSNR, SSIM and MSE metrics in the experiments with different mask ratios. These results show that the proposed method provides a better repair effect in terms of contrast, structure and quality. In addition, the experimental data reveal that the higher the mask ratio is, the larger the SSIM difference between the proposed model and the other models. Moreover, our model obtains the best PSNR. This demonstrates that compared with the other methods, the proposed method provides better structural repair effects in the restoration of murals with large damaged areas while ensuring high image quality.

Qualitative comparative analysis
Compared with the RFR model, the Lama method generates both realistic textures and visually reasonable structural information for most images. Nevertheless, there are some differences between the structures generated by the Lama method and those of the real images (e.g., the deer leg shown in Fig 6-4(G)). The DeepFillV2 model uses sketch guidance to generate the desired information. However, the DeepFillV2 model uses a contextual attention mechanism in the second repair stage. Thus, when this method is used to repair mural images with rich colors, the repaired result significantly differs from the real texture. The proposed method generates accurate and clear structures, and the generated texture is essentially consistent with the real texture. These results verify that the proposed method can improve the inpainting effect for mural images.
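Two of the metrics used in the quantitative comparison, MSE and PSNR, follow directly from their standard definitions and can be sketched in plain Python as below. The flat pixel lists and the 8-bit peak value of 255 are assumptions for illustration; SSIM is omitted because it requires local windowed statistics.

```python
import math


def mse(repaired, original):
    """Mean square error between two equally sized flat pixel sequences."""
    if len(repaired) != len(original):
        raise ValueError("images must have the same number of pixels")
    return sum((a - b) ** 2 for a, b in zip(repaired, original)) / len(original)


def psnr(repaired, original, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE).

    max_value=255 assumes 8-bit pixel values. Identical images give infinity.
    """
    error = mse(repaired, original)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


original_pixels = [10, 20, 30, 40]
repaired_pixels = [12, 18, 30, 40]
error = mse(repaired_pixels, original_pixels)    # (4 + 4 + 0 + 0) / 4
quality = psnr(repaired_pixels, original_pixels)
```

A lower MSE always yields a higher PSNR for a fixed peak value, which is why the two metrics rank methods in opposite directions.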

Comparative analysis of the repair results with different sketch structures
To verify the importance of sketch information in the repair process and evaluate the repair effect of the model guided by different sketch structures, a comparative experiment is performed in this paper according to the presence or absence of sketches and the use of different structural sketches. However, the texture quality generated by the DeepFillV2 model considerably differs from the real texture, while the results of the SC-FEGAN model contain visual artifacts. For complex sketch structures, the SC-FEGAN model produces structural errors in the repaired regions, while the DeepFillV2 model generates incomplete structural information. Although SketchRefiner can generate complete structures according to the sketch information, the method proposed in this study achieves a better effect: the generated structure information is clearer and more accurate, with a better texture repair effect.

Module effectiveness analysis
We also conducted three ablation experiments based on the mural dataset to verify the effectiveness of our proposed modules. The masks used in all the experiments are randomly generated. The repair result obtained using the CA method significantly deviates from the real texture, whereas this texture deviation is hardly visible in the repair result obtained using the SCA method. This shows that the SCA method achieves better texture reconstruction ability based on the mural dataset than previous approaches.

Repair analysis of real damaged murals
To further verify the feasibility of the proposed model, four groups of real damaged mural images are analyzed via repair experiments. We marked the defect area, extracted the structural information of the nondefect area, and manually determined the structure of the defect area. The repair results are shown in Fig 11. The proposed method generates a reasonable structure according to the sketch information, and the generated texture and color are consistent with those of the original mural. The repair effect of the proposed method is better than that of the other three mainstream methods. Therefore, the proposed method has considerable application value in mural image restoration.

Due to the lack of corresponding standard reference images for real damaged murals, a no-reference image quality assessment method is employed to evaluate the repair results. The mean opinion score (MOS), a no-reference evaluation index, is used. The MOS is a quality score representing the observer's judgment of the structural and textural repair effects according to set evaluation criteria [29,30]. The higher the MOS value is, the better the repair effect. The corresponding relationships between the MOS value and the restoration effect are summarized in Table 4. In the real damaged mural restoration experiment, the MOS evaluation results for the structure and texture of the images generated by the proposed method are better than those of the images generated by the comparison methods, indicating that the proposed method can generate reasonable structural information and textures that are more consistent with the real textures. The findings suggest that the proposed method outperforms the comparison methods in terms of both subjective and objective evaluation indices, thus verifying the effectiveness and suitability of the proposed method for the restoration of real murals.

Conclusion
In this study, a two-stage mural image restoration model based on an edge-constrained attention mechanism is designed. In this model, gated convolutions combined with a local edge loss function are introduced in the coarse repair stage. This design improves the guidance provided by the sketches during feature extraction and constrains the network so that the structural information of the generated image is consistent with that of the user-guided sketch.
In the fine repair stage, an edge-constrained attention propagation module is introduced, which calculates attention based on both known and missing regions. This treatment enables the attention score to better reflect the degree of similarity between the missing and known regions and thus enhances the local consistency between the generated and known textures. The edge-constrained attention propagation algorithm, applied after the attention mechanism, uses the structural information of the sketch to propagate and update the attention score, which enhances the semantic relevance between the generated results and the known areas. This study leads to the following main findings:
1. Across various mask ratios, our proposed method consistently outperforms most compared methods in terms of the PSNR and SSIM metrics. Specifically, at mask ratios of 10-20%, 20-30%, 30-40%, and 40-50%, the PSNR values achieved by our method are 36.26, 30.11, 26.00, and 24.22 dB, respectively. These values surpass those obtained by DeepFillV2, RFR, SC-FEGAN, and Lama. Additionally, our method demonstrates superior performance in the SSIM metric across all mask ratios, achieving 0.9872, 0.9753, 0.9510, and 0.9395, respectively.
2. When guided by user sketches, our proposed method effectively produces visually complete inpainting results that maintain structural consistency with the sketches. This finding demonstrates the method's ability to leverage user-provided prior knowledge, ensuring that the generated inpainting images meet user expectations.
3. When applied to real-world damaged mural images, our proposed method consistently outperforms competing methods based on subjective evaluation scores. This finding demonstrates the method's effectiveness and robustness in addressing real-world damage, delivering visually satisfactory inpainting results.
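For readers unfamiliar with the PSNR values quoted in the first finding, the following minimal sketch shows the standard PSNR definition for 8-bit grayscale images. It is not the evaluation code used in this study; the function name `psnr`, the nested-list image representation, and the toy pixel values are assumptions made for illustration.

```python
import math

def psnr(reference, restored, max_value=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2x2 images: two pixels differ by 10 gray levels.
reference = [[100, 100], [100, 100]]
restored = [[100, 110], [100, 90]]
value = psnr(reference, restored)
print(f"PSNR = {value:.2f} dB")
```

Because PSNR grows as the mean squared error shrinks, the decrease in the reported values as the mask ratio increases is consistent with larger missing regions being harder to restore.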
The network proposed in this paper can generate results that satisfy users' expectations according to the provided sketches, and the repaired and known regions exhibit a high level of texture consistency, demonstrating a novel and practical mural repair method. However, this network also has limitations. The proposed attention propagation module uses only the information around the same object to propagate updates and does not consider the information of similar objects, such as repeated patterns and decorations in clothing. In particular, when patterns and decorations are missing, the final repair result may exhibit texture errors. To further improve the effectiveness of the proposed model, multiscale information should be considered in the attention module to better capture the correlations between similar objects. Furthermore, the attention propagation algorithm can be optimized to propagate information based on distant but similar objects. Additionally, mural datasets can be enriched to enable the network to learn more semantic information in the coarse repair stage.

Fig 1. Overall design of the structure-constrained attention network (SCAnet). https://doi.org/10.1371/journal.pone.0307811.g001

Fig 2 shows $I_{fp}$, with the red frame representing the generated result and the yellow frame representing the known true texture information. The attention score calculated based on $I_{fp}$ does not accurately reflect the real relationship between the two regions. In contrast, the red and yellow regions in $I_{fg}$ are both generated by G1; therefore, the attention score calculated based on $I_{fg}$ accurately represents the relationship between the missing and known regions.
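To make the notion of an attention score between a missing patch and the known patches concrete, the sketch below uses the generic contextual-attention formulation: cosine similarity followed by a softmax over the known patches. This is an assumed, simplified stand-in rather than the exact computation used in the proposed network; the flat feature vectors, the `scale` parameter, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # small constant avoids division by zero

def attention_scores(missing_patch, known_patches, scale=10.0):
    """Softmax over scaled cosine similarities to every known patch."""
    sims = [scale * cosine_similarity(missing_patch, k) for k in known_patches]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical features for one missing patch and three known patches.
missing = [0.9, 0.1, 0.0]
known = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.8, 0.2, 0.1]]
scores = attention_scores(missing, known)
print([round(s, 3) for s in scores])
```

In this formulation the scores depend entirely on the features being compared, which is why the choice of image from which both regions are taken affects how well the scores reflect the true relationship between them.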

Fig 5. SCAP structure. https://doi.org/10.1371/journal.pone.0307811.g005

To subjectively evaluate the repair effect of the proposed model, qualitative experiments are conducted on the basis of the sketch obtained by the Canny edge detection algorithm. The comparison results are shown in Fig 6. The RFR model obtains realistic texture details via recurrent feature reasoning; however, this method does not consider the guiding effect of the sketches, which causes the results to deviate from the original structure. Although there are no obvious visual errors in the silk image in Fig 6-1(E), the deer leg image in Fig 6-4(E), the auspicious cloud image in Fig 6-5(E), and the screw bun image in Fig 6-6(E), the generated structures differ greatly from the real structure.

Fig 6. Comparison of the mural restoration results obtained by the different models. https://doi.org/10.1371/journal.pone.0307811.g006

Fig 7 shows the different parts of sketches 1 and 2, which were manually drawn. Fig 7B shows that without the guidance of the sketch structure, the four repair methods can obtain suitable results in unstructured regions with the same surrounding texture, but the results are blurred at the boundary between regions with different textures. Thus, sketches are crucial for the structural restoration of mural images. Fig 7D and 7F reveal that the DeepFillV2 and SC-FEGAN models both generate simple structures based on the sketch information.

1. Effectiveness of the $L_{edge}$ function. Fig 8A shows the input image to be repaired, Fig 8B shows the structure sketch extracted from the original image, Fig 8C shows the image inpainting effect without the $L_{edge}$ loss function, Fig 8D shows the image inpainting effect with the $L_{edge}$ loss function, and Fig 8E shows the original image. The comparative analysis demonstrates that the $L_{edge}$ function proposed in this paper improves the structural reconstruction ability of the proposed model.
2. Effectiveness of the SCA module. Fig 9A shows the input image to be repaired, Fig 9B shows the structure sketch extracted from the original image, Fig 9C shows the image inpainting result of the mainstream CA algorithm, Fig 9D shows the image inpainting result of the SCA-based image inpainting module proposed in this paper, and Fig 9E shows the original image.

3. Effectiveness of the SCAP module. Fig 10A shows the input image to be repaired, Fig 10B shows the structure sketch extracted from the original image, Fig 10C shows the image restoration effect without the SCAP module, Fig 10D shows the image restoration effect with the SCAP module, and Fig 10E shows the original image. There are color errors in the repair results of the network without the SCAP module, whereas the network with the SCAP module propagates and updates the attention scores between similar objects and thus generates the correct color.

Fig 12. Comparison of the subjective and objective evaluation scores of the restoration results for real damaged murals. (a) shows the MOS value of the similarity between the structure and sketch, and (b) shows the MOS value of the texture restoration result. https://doi.org/10.1371/journal.pone.0307811.g012